Approximate Furthest Neighbor in High Dimensions
Abstract
Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries. We present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. We build on the technique of Indyk (SODA 2003), storing random projections to provide sublinear query time for AFN. However, we introduce a different query algorithm, improving on Indyk’s approximation factor and reducing the running time by a logarithmic factor. We also present a variation based on a query-independent ordering of the database points; while this does not have the provable approximation factor of the query-dependent data structure, it offers significant improvement in time and space complexity. We give a theoretical analysis and experimental results.
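The random-projection idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the paper's exact algorithm or parameter choice: the function names (`build_index`, `query_furthest`) and the parameters (`num_projections`, `per_projection`, `budget`) are hypothetical, and the sketch assumes Gaussian projection directions and a fixed candidate budget. For each random direction it keeps the points with the largest projection values; at query time it visits candidates in decreasing order of projected gap to the query and computes true distances only for those candidates.

```python
import heapq
import math
import random


def build_index(points, num_projections, per_projection, seed=0):
    """For each random direction, keep the points with the largest projections."""
    rng = random.Random(seed)
    dim = len(points[0])
    index = []
    for _ in range(num_projections):
        # Assumption: directions are drawn with i.i.d. standard Gaussian entries.
        direction = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        scored = [(sum(a * x for a, x in zip(direction, p)), i)
                  for i, p in enumerate(points)]
        scored.sort(reverse=True)
        index.append((direction, scored[:per_projection]))
    return index


def query_furthest(points, index, q, budget):
    """Check up to `budget` distinct candidates; return (point index, distance)."""
    heap = []
    for j, (direction, scored) in enumerate(index):
        q_proj = sum(a * x for a, x in zip(direction, q))
        value, _ = scored[0]
        # Max-heap on the projected gap (value - q_proj), via negated keys.
        heapq.heappush(heap, (-(value - q_proj), j, 0, q_proj))
    best_i, best_d = None, -1.0
    seen = set()
    checked = 0
    while heap and checked < budget:
        _, j, pos, q_proj = heapq.heappop(heap)
        scored = index[j][1]
        i = scored[pos][1]
        if i not in seen:
            seen.add(i)
            checked += 1
            d = math.dist(points[i], q)  # true Euclidean distance
            if d > best_d:
                best_i, best_d = i, d
        if pos + 1 < len(scored):
            value = scored[pos + 1][0]
            heapq.heappush(heap, (-(value - q_proj), j, pos + 1, q_proj))
    return best_i, best_d
```

When `per_projection` and `budget` both equal the number of points, every point is checked and the result is exact; smaller values trade accuracy for speed, which is the regime the abstract's approximation guarantee addresses.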
Similar resources
Fast approximate furthest neighbors with data-dependent hashing
We present a novel hashing strategy for approximate furthest neighbor search that selects projection bases using the data distribution. This strategy leads to an algorithm, which we call DrusillaHash, that is able to outperform existing approximate furthest neighbor strategies. Our strategy is motivated by an empirical study of the behavior of the furthest neighbor search problem, which lends i...
Approximate Furthest Neighbor with Application to Annulus Query
Much recent work has been devoted to approximate nearest neighbor queries. Motivated by applications in recommender systems, we consider approximate furthest neighbor (AFN) queries and present a simple, fast, and highly practical data structure for answering AFN queries in high-dimensional Euclidean space. The method builds on the technique of Indyk (SODA 2003), storing random projections to pr...
When Crossings Count - Approximating the Minimum
We present a (1+ε)-approximation algorithm for computing the minimum spanning tree of points in a planar arrangement of lines, where the metric is the number of crossings between the spanning tree and the lines. The expected running time of the algorithm is near linear. We also show how to embed such a crossing metric of hyperplanes in d dimensions, in subquadratic time, into high-dimensions s...
Nearest Neighbor Search using Kd-trees
We suggest a simple modification to the kd-tree search algorithm for nearest neighbor search that results in improved performance. The kd-tree data structure seems to work well for finding nearest neighbors in low dimensions, but its performance degrades even when the number of dimensions rises above three. Since the exact nearest neighbor search problem suffers from the curse of dimensi...
Hardness of String Similarity Search and Other Indexing Problems
Similarity search is a fundamental problem in computer science. Given a set of points from a universe and a distance measure, it is possible to pose similarity search queries on a point in the form of nearest neighbors (find the string that has the smallest edit distance to a query string) or in the form of furthest neighbors (find the string that has the longest common subsequence with a quer...